Estimating Conditional Densities from Sparse Data for Statistical Language Modeling

Authors

  • Damianos Karakos
  • Sanjeev Khudanpur
Abstract

The Maximum Likelihood Set (MLS) was recently introduced in [1] as an effective, parameter-free technique for estimating a probability mass function (pmf) from sparse data. The MLS contains all pmfs that assign a higher likelihood to the observed counts than to any other set of counts, for the same sample size. In this paper, the MLS is extended to the case of conditional density estimation. First, it is shown that, when the criterion for selecting a pmf from the MLS is the KL-divergence, the selected conditional pmf naturally has a back-off form, except for a ceiling on the probability of high-frequency unigrams that are not seen in particular contexts. Second, the pmf has a sparse parameterization, leading to efficient algorithms for KL-divergence minimization. Finally, a novel fattening of the MLS, called the High Likelihood Set (HLS), is introduced. It contains the MLS and some neighboring pmfs. Experimental results from bigram and trigram estimation indicate that pmfs selected from the HLS are competitive with state-of-the-art estimates.

I. THE DENSITY ESTIMATION PROBLEM

The problem of probability density estimation may be formulated as follows: a sequence W = {w1, ..., wN} of independent samples, drawn according to an unknown probability mass function (pmf) PTrue, is observed, and the goal is to estimate PTrue. It is assumed that the samples wj belong to a discrete and finite set V = {1, ..., V}. To facilitate a more concrete exposition, think of V as the vocabulary of a statistical language model (LM), and W as the training corpus. This estimation problem is, of course, a recurring problem not only in natural language processing (NLP) but indeed in all of statistics. A popular estimate of PTrue is the maximum likelihood estimate,
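
As a rough illustration of the two notions above, the following Python sketch computes the maximum likelihood estimate as relative frequencies and checks by brute force whether a pmf belongs to the MLS for a tiny vocabulary. It rests on assumptions that are not stated in this excerpt: "likelihood of the counts" is taken to be the multinomial probability of the count vector, ties are allowed (up to a small tolerance), and the function names `mle`, `count_probability`, and `in_mls` are hypothetical. The enumeration is exponential in the vocabulary size and is only meant to make the definition concrete, not to reproduce the paper's algorithms.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def mle(samples, vocab_size):
    """Relative-frequency (maximum likelihood) estimate over {1, ..., vocab_size}."""
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(w, 0) / n for w in range(1, vocab_size + 1)]


def count_probability(pmf, counts):
    """Multinomial probability that sum(counts) i.i.d. draws from pmf yield these counts.

    Assumption: this is what "likelihood of the counts" means here.
    """
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p ** c for p, c in zip(pmf, counts))


def in_mls(pmf, observed, tol=1e-12):
    """Brute-force MLS membership test for small problems.

    Returns True when no other count vector with the same total is more
    likely under pmf than the observed one (ties allowed, by assumption).
    """
    n = sum(observed)
    target = count_probability(pmf, observed)
    for counts in product(range(n + 1), repeat=len(observed)):
        if sum(counts) == n and count_probability(pmf, counts) > target + tol:
            return False
    return True


samples = [1, 1, 2]        # toy corpus over the vocabulary {1, 2, 3}
observed = [2, 1, 0]       # counts of symbols 1, 2, 3
p_ml = mle(samples, 3)     # assigns zero probability to the unseen symbol 3
print(in_mls(p_ml, observed))             # the relative-frequency estimate
print(in_mls([0.6, 0.3, 0.1], observed))  # a pmf that reserves mass for symbol 3
print(in_mls([1 / 3] * 3, observed))      # the uniform pmf
```

In this toy case the relative-frequency estimate and the pmf (0.6, 0.3, 0.1) both pass the check, while the uniform pmf fails because the count vector (1, 1, 1) is more likely under it than the observed (2, 1, 0). This shows how the set can contain pmfs that give nonzero probability to unseen symbols.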

Similar resources

A Total Ratio of Vegetation Index (TRVI) for Shrubs Sparse Cover Delineating in Open Woodland

Persian juniper and Pistachio are grown in low density in the rangelands of North-East of Iran. These rangelands are populated by evergreen conifers, which are widespread and present at low-density and sparse shrub of pistachio in Iran, that are not only environmentally but also genetically essential as seed sources for pistachio improvement in orchards. Rangelands offer excellent opportunities...

Behavioral Foundations for Conditional Markov Models of Aggregate Data

Conditional Markov chain models of observed aggregate share–type data have been used by economic researchers for several years, but the classes of models commonly used in practice are often criticized as being purely ad hoc because they are not derived from micro–behavioral foundations. The primary purpose of this paper is to show that the estimating equations commonly used to estimate these co...

Estimating Conditional Probability Densities for Periodic Variables

Most of the common techniques for estimating conditional probability densities are inappropriate for applications involving periodic variables. In this paper we introduce three novel techniques for tackling such problems, and investigate their performance using synthetic data. We then apply these techniques to the problem of extracting the distribution of wind vector directions from radar scatt...

Factored Language Models Tutorial

The Factored Language Model (FLM) is a flexible framework for incorporating various information sources, such as morphology and part-of-speech, into language modeling. FLMs have so far been successfully applied to tasks such as speech recognition and machine translation; they have the potential to be used in a wide variety of problems in estimating probability tables from sparse data. This tutoria...

Self-organized Language Modeling for Speech Recognition. Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer

Word sense disambiguation using statistical models of Roget's categories trained on large corpora. A method for disambiguating word senses in a large corpus. A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. 36 Ido Dagan and Alon Itai Word Sense Disambiguation comments, which resulted in additional discussions and clarific...

Publication date: 2006